- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
LGRPool: Hierarchical Graph Pooling Via Local-Global Regularisation
Noravesh, Farshad, Haffari, Reza, Soon, Layki, Pal, Arghya
Hierarchical graph pooling (HGP) methods are designed to address the fact that conventional graph neural networks (GNNs) are inherently flat and not multiscale. However, most HGP methods not only neglect the global topology of the graph, focusing instead on feature learning, but also fail to align local and global features, even though graphs should inherently be analyzed in a multiscale way. This paper proposes LGRPool, an HGP method formulated within the expectation-maximization framework. LGRPool uses a regularizer to align the local and global aspects of message passing: the regularizer forces the global topological information to be in line with the local message passing at different scales, through the representations at the different layers of the HGP. Experimental results on a set of graph classification benchmarks show that LGRPool slightly outperforms some baselines.
- Asia > Malaysia (0.04)
- Oceania > Australia (0.04)
- North America > United States (0.04)
PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model
Liu, Shang-Ching, Tran, Van Nhiem, Chen, Wenkai, Cheng, Wei-Lun, Huang, Yen-Lin, Liao, I-Bin, Li, Yung-Hui, Zhang, Jianwei
Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Vision-Language Models (VLMs) have excelled in high-level reasoning and long-horizon planning for robotic manipulation, they still fall short in grasping the nuanced physical properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), a framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance 3D affordance understanding of point clouds. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 models to generate refined, context-aware text, augmenting the instructional input with deeper semantic cues. Experimental results on the 3D-AffordanceNet benchmark demonstrate that PAVLM outperforms baseline methods for both full and partial point clouds, and in particular generalizes well to novel open-world affordance tasks on 3D objects. For more information, visit our project site: pavlm-source.github.io.
Learning Affinity via Spatial Propagation Networks
Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, Jan Kautz
In this paper, we propose spatial propagation networks for learning the affinity matrix for vision tasks. We show that by constructing a row/column linear propagation model, the spatially varying transformation matrix exactly constitutes an affinity matrix that models dense, global pairwise relationships of an image. Specifically, we develop a three-way connection for the linear propagation model, which (a) formulates a sparse transformation matrix, where all elements can be outputs from a deep CNN, but (b) results in a dense affinity matrix that effectively models any task-specific pairwise similarity matrix.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
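The row/column linear propagation with a three-way connection can be sketched in plain Python. This is a minimal sketch assuming the usual formulation in which each pixel of column t mixes its own input with the hidden states of its three neighbours (row above, same row, row below) in column t - 1, with the weights rescaled so that their absolute values sum to at most 1 for stability. The function name `propagate_left_to_right`, the rule of skipping neighbours outside the grid, and the use of fixed numbers in place of CNN outputs are illustrative assumptions, not the authors' code; only one propagation direction is shown.

```python
from typing import List

Grid = List[List[float]]


def propagate_left_to_right(x: Grid, p: List[List[List[float]]]) -> Grid:
    """Three-way linear propagation over the columns of an H x W grid.

    x[k][t]    : input value at row k, column t.
    p[k][t][j] : weight linking (k, t) to row k - 1 + j of column t - 1,
                 for j in {0, 1, 2} (top-left, left, bottom-left).
    In SPN these weights would be predicted by a CNN; here they are plain inputs.
    """
    rows, cols = len(x), len(x[0])
    h = [[0.0] * cols for _ in range(rows)]
    for k in range(rows):
        h[k][0] = x[k][0]  # the first column has no predecessor
    for t in range(1, cols):
        for k in range(rows):
            weights = p[k][t]
            # Rescale so the absolute weights sum to at most 1 (stability assumption).
            total = sum(abs(w) for w in weights)
            if total > 1.0:
                weights = [w / total for w in weights]
            acc = 0.0
            used = 0.0
            for j, w in enumerate(weights):
                nb = k - 1 + j
                if 0 <= nb < rows:  # neighbours outside the grid are skipped
                    acc += w * h[nb][t - 1]
                    used += w
            # Keep (1 - sum of weights) of the input, add the weighted history.
            h[k][t] = (1.0 - used) * x[k][t] + acc
    return h
```

With all weights at zero the output equals the input, and with weight 1 on the left neighbour each row copies its first value across; a constant input stays constant for any valid weights, since the mixing coefficients always sum to 1.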
Dynamic Graph Node Classification via Time Augmentation
Sun, Jiarui, Gu, Mengting, Yeh, Chin-Chia Michael, Fan, Yujie, Chowdhary, Girish, Zhang, Wei
Node classification for graph-structured data aims to classify nodes whose labels are unknown. While studies on static graphs are prevalent, few studies have focused on dynamic graph node classification. Node classification on dynamic graphs is challenging for two reasons. First, the model needs to capture both structural and temporal information, particularly on dynamic graphs that have a long history and require large receptive fields. Second, model scalability becomes a significant concern as the size of the dynamic graph increases. To address these problems, we propose the Time Augmented Dynamic Graph Neural Network (TADGNN) framework. TADGNN consists of two modules: 1) a time augmentation module that structurally captures the temporal evolution of nodes across time, creating a time-augmented spatio-temporal graph, and 2) an information propagation module that learns the dynamic representation of each node across time using the constructed time-augmented graph. We perform node classification experiments on four dynamic graph benchmarks. Experimental results demonstrate that the TADGNN framework outperforms several static and dynamic state-of-the-art (SOTA) GNN models while demonstrating superior scalability. We also conduct theoretical and empirical analyses to validate the efficiency of the proposed method. Our code is available at https://sites.google.com/view/tadgnn.
- North America > United States > Illinois (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
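The time augmentation step can be pictured as building a single graph out of a sequence of snapshots. This is a minimal sketch under stated assumptions: every node gets one copy per time step, structural edges from snapshot t connect copies at step t, and a directed temporal edge links each node's copy at t to its own copy at t + 1. The abstract does not specify these details, so the construction and the name `time_augmented_graph` are hypothetical rather than TADGNN's actual implementation.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (node id, time step)


def time_augmented_graph(
    num_nodes: int, snapshots: List[List[Tuple[int, int]]]
) -> Dict[Node, Set[Node]]:
    """Build a directed adjacency list over (node, time) copies.

    snapshots[t] holds the undirected edges observed at time step t.
    """
    steps = len(snapshots)
    adj: Dict[Node, Set[Node]] = {
        (v, t): set() for t in range(steps) for v in range(num_nodes)
    }
    for t, edges in enumerate(snapshots):
        for u, v in edges:
            # Structural (spatial) edges stay within a single time step.
            adj[(u, t)].add((v, t))
            adj[(v, t)].add((u, t))
        if t + 1 < steps:
            for v in range(num_nodes):
                # Temporal edge: each node's copy at t feeds its copy at t + 1.
                adj[(v, t)].add((v, t + 1))
    return adj
```

A message-passing GNN run on the resulting graph can reach a node's copy at the last time step from its earlier copies and their neighbours, so the temporal history becomes ordinary graph structure.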
ScribbleBox: Interactive Annotation Framework for Video Object Segmentation
Chen, Bowen, Ling, Huan, Zeng, Xiaohui, Gao, Jun, Xu, Ziyue, Fidler, Sanja
Manually labeling video datasets for segmentation tasks is extremely time-consuming. In this paper, we introduce ScribbleBox, a novel interactive framework for annotating object instances with masks in videos. In particular, we split annotation into two steps: annotating objects with tracked boxes, and labeling masks inside these tracks. We introduce automation and interaction in both steps. Box tracks are annotated efficiently by approximating the trajectory using a parametric curve with a small number of control points that the annotator can interactively correct. Our approach tolerates a modest amount of noise in the box placements, so typically only a few clicks are needed to annotate tracked boxes to sufficient accuracy. Segmentation masks are corrected via scribbles, which are efficiently propagated through time. We show significant gains in annotation efficiency over past work: our ScribbleBox approach reaches 88.92% J&F on DAVIS2017 with 9.14 clicks per box track and 4 frames of scribble annotation.
- North America > Canada > Ontario > Toronto (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
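The box-track parametrisation can be illustrated by expanding a few control points into one box per frame. The abstract says only that the trajectory is approximated by a parametric curve with a small number of control points; this sketch assumes the simplest such curve, piecewise-linear interpolation of (x, y, width, height) between control frames, and clamps frames outside the controlled range. The function name `interpolate_track` and the box layout are illustrative assumptions, and the paper's actual curve may differ.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), assumed layout


def interpolate_track(control: Dict[int, Box], num_frames: int) -> List[Box]:
    """Expand sparse control points (frame -> box) into one box per frame.

    Frames before the first or after the last control point are clamped.
    """
    keys = sorted(control)
    track: List[Box] = []
    for f in range(num_frames):
        i = bisect_right(keys, f)  # number of control frames <= f
        if i == 0:
            track.append(control[keys[0]])
        elif i == len(keys):
            track.append(control[keys[-1]])
        else:
            f0, f1 = keys[i - 1], keys[i]
            a = (f - f0) / (f1 - f0)  # position between the two control frames
            b0, b1 = control[f0], control[f1]
            track.append(tuple((1 - a) * c0 + a * c1 for c0, c1 in zip(b0, b1)))
    return track
```

Correcting the track then amounts to editing or inserting one control point and re-running the interpolation, which is one way a few clicks can fix many frames at once.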
Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking
Crawford, Eric, Pineau, Joelle
The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects without supervision (i.e., without access to annotated training videos), since this will allow agents to begin operating in new environments with minimal human assistance. The task of learning to discover and track objects in videos, which we call unsupervised object tracking, has grown in prominence in recent years; however, most architectures that address it still struggle to deal with large scenes containing many objects. In the current work, we propose an architecture that scales well to the large-scene, many-object setting by employing spatially invariant computations (convolutions and spatial attention) and representations (a spatially local object specification scheme). In a series of experiments, we demonstrate a number of attractive features of our architecture; most notably, it outperforms competing methods at tracking objects in cluttered scenes with many objects, and it generalizes well to videos that are larger and/or contain more objects than those encountered during training.
- North America > Canada > Quebec > Montreal (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
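A spatially local object specification can be illustrated by decoding per-cell predictions into global coordinates. In this sketch, which is an assumption rather than the paper's exact scheme, each cell of a feature grid predicts an object centre as a sigmoid-squashed offset inside its own cell; `cell_to_global`, `decode_grid`, and `cell_size` are hypothetical names. Because every cell runs the same decoder on local quantities, identical raw outputs in two different cells yield the same object shifted by exactly the displacement between the cells, and the decoder applies unchanged to a larger grid.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (y, x) in image pixels


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cell_to_global(
    row: int, col: int, raw_dy: float, raw_dx: float, cell_size: float
) -> Point:
    """Convert a cell-local centre prediction into global image coordinates.

    The sigmoid keeps the predicted centre inside the cell that produced it.
    """
    return ((row + sigmoid(raw_dy)) * cell_size, (col + sigmoid(raw_dx)) * cell_size)


def decode_grid(raw: List[List[Tuple[float, float]]], cell_size: float) -> List[Point]:
    """Apply the same local decoder at every cell, for a grid of any size."""
    return [
        cell_to_global(r, c, dy, dx, cell_size)
        for r, cells in enumerate(raw)
        for c, (dy, dx) in enumerate(cells)
    ]
```

Since nothing in the decoder depends on the grid's overall size, a model built from such local pieces can, in principle, be run on larger frames than those it was trained on.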
Learning Affinity via Spatial Propagation Networks
Liu, Sifei, De Mello, Shalini, Gu, Jinwei, Zhong, Guangyu, Yang, Ming-Hsuan, Kautz, Jan
In this paper, we propose spatial propagation networks for learning the affinity matrix. We show that by constructing a row/column linear propagation model, the spatially variant transformation matrix constitutes an affinity matrix that models dense, global pairwise similarities of an image. Specifically, we develop a three-way connection for the linear propagation model, which (a) formulates a sparse transformation matrix where all elements can be the output of a deep CNN, but (b) results in a dense affinity matrix that can effectively model any task-specific pairwise similarity. Instead of designing the similarity kernels according to the image features of two points, we can directly output all similarities in a purely data-driven manner. The spatial propagation network is a generic framework that can be applied to numerous tasks that traditionally benefit from designed affinity, e.g., image matting, colorization, and guided filtering, to name a few. Furthermore, owing to the learning capability of the deep model, it can also learn semantic-aware affinity for high-level vision tasks. We validate the proposed framework on the refinement of object segmentation. Experiments on the HELEN face parsing and PASCAL VOC-2012 semantic segmentation tasks show that the spatial propagation network provides general, effective, and efficient solutions for generating high-quality segmentation results.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
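The claim that a sparse propagation yields a dense affinity can be checked on a one-dimensional, single-direction simplification. Assuming the recurrence h[t] = (1 - p[t]) x[t] + p[t] h[t-1] with h[0] = x[0] (one weight per step, a reduced stand-in for SPN's three-way connection), unrolling gives h = G x with G[t][s] = (1 - p[s]) times the product of p[r] for r from s + 1 to t, where p[0] is taken as 0. The helper names `propagate_1d`, `affinity_matrix`, and `matvec` are hypothetical.

```python
from typing import List


def propagate_1d(x: List[float], p: List[float]) -> List[float]:
    """h[t] = (1 - p[t]) * x[t] + p[t] * h[t - 1], with h[0] = x[0]."""
    h = [x[0]]
    for t in range(1, len(x)):
        h.append((1.0 - p[t]) * x[t] + p[t] * h[t - 1])
    return h


def affinity_matrix(p: List[float]) -> List[List[float]]:
    """Lower-triangular G such that propagate_1d(x, p) equals G @ x."""
    n = len(p)
    q = [0.0] + list(p[1:])  # p[0] is unused because h[0] = x[0]
    g = [[0.0] * n for _ in range(n)]
    for t in range(n):
        for s in range(t + 1):
            w = 1.0 - q[s]  # share of x[s] kept at step s
            for r in range(s + 1, t + 1):
                w *= q[r]  # then carried forward through steps s+1..t
            g[t][s] = w
    return g


def matvec(g: List[List[float]], x: List[float]) -> List[float]:
    return [sum(gi * xi for gi, xi in zip(row, x)) for row in g]
```

Each step reads only one previous state, yet every entry on or below the diagonal of G is non-zero whenever 0 < p[t] < 1, and each row of G sums to 1. Combining several propagation directions would extend this coverage beyond one side of each position.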